Layering and Merging Linguistic Annotations
Abstract
The American National Corpus (ANC) and its annotations are represented in a stand-off XML format compliant with the specifications of ISO TC37 SC4 WG1’s Linguistic Annotation Framework. Because few of the systems that support searching and accessing the corpus currently handle stand-off markup, the project has developed a SAX-like parser that generates ANC data with the annotations in-line, in a variety of output formats.
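To make the stand-off to in-line conversion concrete, here is a minimal Python sketch. It is not the ANC project's parser: the function name `inline_annotations`, the tuple layout `(start, end, tag, attrs)`, and the tag names are all hypothetical, and the annotations are held in memory rather than read from stand-off XML files. It assumes character offsets into the primary text, non-empty spans, and properly nested (non-crossing) annotations.

```python
from xml.sax.saxutils import escape, quoteattr


def inline_annotations(text, annotations):
    """Merge stand-off annotations into one in-line XML string.

    annotations: list of (start, end, tag, attrs) tuples, where start/end
    are character offsets into text (hypothetical layout, assumed here).
    Spans are assumed to be non-empty and properly nested.
    """
    events = []
    for idx, (start, end, tag, attrs) in enumerate(annotations):
        if not 0 <= start < end <= len(text):
            raise ValueError(f"bad span for <{tag}>: {start}-{end}")
        # Open events: at equal offsets, longer (outer) spans open first.
        events.append((start, 1, -end, idx, tag, attrs))
        # Close events: sorted before opens at the same offset, and
        # inner spans (later start) close before outer ones.
        events.append((end, 0, -start, -idx, tag, None))
    events.sort(key=lambda e: e[:4])

    out, pos = [], 0
    for offset, is_open, _, _, tag, attrs in events:
        # Emit the primary text between the previous event and this one.
        out.append(escape(text[pos:offset]))
        pos = offset
        if is_open:
            attr_text = "".join(
                f" {name}={quoteattr(value)}" for name, value in attrs.items()
            )
            out.append(f"<{tag}{attr_text}>")
        else:
            out.append(f"</{tag}>")
    out.append(escape(text[pos:]))
    return "".join(out)


layers = [
    (0, 10, "s", {}),                # sentence layer
    (0, 4, "tok", {"pos": "NNS"}),   # token layer with part-of-speech
    (5, 9, "tok", {"pos": "VBP"}),
]
print(inline_annotations("Dogs bark.", layers))
# <s><tok pos="NNS">Dogs</tok> <tok pos="VBP">bark</tok>.</s>
```

Keeping each annotation layer as a separate list of spans means a layer can be included or left out of the merged output simply by choosing which lists to pass in; crossing spans would need an extra policy (for example, splitting elements) that this sketch does not attempt.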
Similar resources
Integrating Linguistic Resources: The American National Corpus Model
This paper describes the architecture of the American National Corpus and the design decisions we have made in order to make the corpus easy to use with a variety of existing tools with varying functionality, and to allow for layering multiple annotations over the data. The overall goal of the ANC project is to provide an “open linguistic infrastructure” for American English, consisting of as m...
Technical Report: Adjudication of Coreference Annotations via Answer Set Optimization
We describe the first automatic approach for merging coreference annotations obtained from multiple annotators into a single gold standard. This merging is subject to certain linguistic hard constraints and optimization criteria that prefer solutions with minimal divergence from annotators. The representation involves an equivalence relation over a large number of elements. We use Answer Set Pr...
By all these lovely tokens... Merging Conflicting Tokenizations
Given the contemporary trend to modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text represents a pervasive problem in everyday NLP practice and poses a non-trivial theoretical problem to the integration of linguistic annotations and their interpretability in general. This paper describes a solution for integrating different ...
Layering in structural-functional grammars
This article presents an overview of the notion of layering in three types of structural-functional grammars. Layering is interpreted in a broad sense to refer to two types of linguistic differentiations: (1) on the one hand, a distinction between levels of coding, namely, syntax/grammar, semantics, pragmatics; (2) on the other hand, a differentiation between speaker-related vs. content-related ...
The Linguistic Annotation Framework: a standard for annotation interchange and merging
This paper overviews the International Standards Organization Linguistic Annotation Framework (ISO LAF) developed in ISO TC37 SC4. We describe the XML serialization of ISO LAF, the Graph Annotation Format (GrAF) and discuss the rationale behind the various decisions that were made in determining the standard. We describe the structure of the GrAF headers in detail and provide multiple examples ...